Abstract

We have developed and validated a statistical model to estimate the fugacity (or partial pressure)
of carbon dioxide (CO2) at the sea surface (pCO2sea) from space-based observations of sea surface
temperature (SST), chlorophyll, and salinity. More than a quarter million in situ measurements
coincident with satellite data were compiled to train and validate the model. We have produced
and made accessible 9 years (2002-2010) of daily pCO2sea at 0.5 degree resolution over the
global ocean. The data were used to reveal multi-year and regional variability of pCO2sea in
relation to ocean parameters. The data also identify uncertainties in the current JPL Carbon
Monitoring System (CMS) model-based, bottom-up estimates over the subtropical oligotrophic
oceans, where biological production is not a significant factor in pCO2sea changes.

1. Significance 

The alarmingly rapid increase of global atmospheric carbon dioxide (CO2) content has been well
documented (e.g., Hofmann et al. 2009), but the distributions of surface sources and sinks have
not been sufficiently known. NASA’s Orbiting Carbon Observatory (OCO) is designed to give a
more accurate measurement of the column-integrated CO2 content (Crisp et al. 2004), from which
surface sources and sinks could be inferred. In the past, sparse atmospheric measurements were
assimilated into atmospheric transport models, and the surface sources and sinks were derived
with an inverse technique (e.g., Hein et al. 1996; Gurney et al. 2004; Patra et al. 2006;
Yang et al. 2007; Engelen et al. 2009). However, the high sensitivity of flux inversion systems
to regional biases and the large spread of model results are well known (Chevalier et al. 2005;
Gurney et al. 2004). Significant deficiencies and imbalances in estimates of the carbon cycle
remain (e.g., Canadell et al. 2007; Le Quere et al. 2007; Watson et al. 2009). The ocean is an
important natural sink for atmospheric CO2, and estimating the ocean-atmosphere exchange of CO2
is critical for understanding and predicting climate change.

Ocean surface CO2 fugacity is critical in quantifying the flux, as described in Section 2. It is
also the surface signature of ocean acidity, dynamics, and biogeochemistry. Monitoring and
characterizing its variability are also important to ecology and the economy.

2. Bulk Parameterization

The air-sea exchange of CO2 (F in Equation (1)) is largely driven by turbulence and has been
estimated through bulk parameterization (e.g., Takahashi et al. 2002):

$$F = k\,\alpha\,(\Delta p\mathrm{CO_2}) \qquad (1)$$

where k is the CO2 gas transfer (piston) velocity, α is the solubility of CO2 in seawater (Weiss
1974), and ΔpCO2 is the difference between the partial pressure of CO2 in water (pCO2sea)
and that in air near the surface (pCO2air).
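
As a worked illustration of Equation (1), the sketch below evaluates the flux for a single set of
inputs. The function name, the choice of units (transfer velocity in cm/h, solubility in
mol L^-1 atm^-1, pressure difference in µatm), and the sample numbers are assumptions made for
illustration only; they are not values from this study.

```python
def air_sea_co2_flux(k_cm_per_hr, alpha_mol_per_l_atm, dpco2_uatm):
    """Bulk flux F = k * alpha * dpCO2, returned in mol m^-2 yr^-1.

    dpco2_uatm is pCO2sea minus pCO2air, so a positive F means
    outgassing (sea to air) and a negative F means ocean uptake.
    """
    # cm/h -> m/yr (365-day year assumed for simplicity)
    k_m_per_yr = k_cm_per_hr * 0.01 * 24.0 * 365.0
    # mol L^-1 atm^-1 -> mol m^-3 uatm^-1
    alpha_mol_per_m3_uatm = alpha_mol_per_l_atm * 1000.0 * 1.0e-6
    return k_m_per_yr * alpha_mol_per_m3_uatm * dpco2_uatm


# Hypothetical inputs: k = 10 cm/h, alpha = 0.03 mol L^-1 atm^-1,
# pCO2sea = 350 uatm, pCO2air = 370 uatm.
flux = air_sea_co2_flux(10.0, 0.03, 350.0 - 370.0)
print(f"{flux:.4f} mol m^-2 yr^-1")  # negative: the ocean takes up CO2
```

With these inputs the sign of the result is set entirely by ΔpCO2, which is why an accurate
pCO2sea field matters as much as the transfer velocity.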
In many studies, the fugacity fCO2 is used in place of pCO2 to account for the non-ideal
behavior of the gas. For an ideal gas, f = p. Converting one to the other requires knowledge of
pressure, temperature, and the concentration of CO2 (xCO2). The method is described in the DOE
(1994) handbook and its modification by Dickson et al. (2007), using the empirical formula of
Weiss (1974). The difference between fCO2 and pCO2 is generally negligible compared
with the measurement uncertainties. For fixed pressure at 1013 mb and xCO2 of
350 ppm, changing temperature from -5°C to 30°C changes the difference between fCO2 and
pCO2 by less than 0.2%. With fixed temperature at 25°C and xCO2 = 350 ppm, changing p from
1013 mb to 900 mb changes the difference between fCO2 and pCO2 by less than 0.3%. In this
study, we do not distinguish between the two parameters.
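
The size of this correction can be checked with a short calculation. The sketch below uses the
virial-coefficient expressions commonly quoted from Weiss (1974) in the DOE handbook, in the
simplified form that neglects the small (1 - xCO2)^2 factor on the cross-virial term. The
function name is hypothetical, and the coefficients are reproduced here as an assumption that
should be checked against the handbook before any quantitative use.

```python
import math

R_GAS = 8.314462618  # universal gas constant, J mol^-1 K^-1


def fco2_over_pco2(temp_c, pressure_pa=101325.0):
    """Approximate ratio fCO2 / pCO2 for CO2 in air (xCO2 << 1 assumed)."""
    t = temp_c + 273.15  # kelvin
    # Second virial coefficient of pure CO2, cm^3 mol^-1
    b = -1636.75 + 12.0408 * t - 3.27957e-2 * t**2 + 3.16528e-5 * t**3
    # Cross-virial term for CO2 in air, cm^3 mol^-1
    delta = 57.7 - 0.118 * t
    # 1e-6 converts cm^3 to m^3 so the exponent is dimensionless
    return math.exp(pressure_pa * (b + 2.0 * delta) * 1.0e-6 / (R_GAS * t))


for temp in (-5.0, 25.0, 30.0):
    print(temp, round(fco2_over_pco2(temp), 4))
print("900 mb, 25 C:", round(fco2_over_pco2(25.0, 90000.0), 4))
```

Under these assumptions the ratio stays within a few tenths of a percent of unity over the
temperature and pressure ranges quoted above.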

The modeling of the transfer velocity in terms of wind speed has been extensively investigated,
and the advantage of space-based measurement of wind speed in providing the needed temporal
and spatial resolution has been demonstrated (e.g., Liss and Merlivat 1986; Watson et al. 1991;
Wanninkhof 1992; Nightingale et al. 2000; Boutin et al. 2002; Carr et al. 2002). Other factors,
such as surfactants (Frew 1997; Tsai and Liu 2004; Lin et al. 2003; Hashizume and Liu 2004)
and bubbles (e.g., Woolf 1997), have also been studied. A few studies have suggested that
wind speed is not a sufficient parameter for k (e.g., Glover et al. 2002), and that surface
roughness and stress are better parameters. Glover et al. (2002) supported the theory that k
depends on wave slope, using the specular backscatter of radar altimeters. Nadir-looking
altimeters have limited coverage, and the Bragg backscatter measured by scatterometers
is an obvious alternative (Bogucki et al. 2010). With the new prospect of retrieving stress and
roughness directly from scatterometer backscatter (e.g., Liu et al. 2010), the improvement of k
estimation is being investigated in a complementary study. The pCO2air is believed to
change much less than pCO2sea. It may be estimated through a combination of in situ
measurements, satellite data, and numerical models, and it is explored in a separate study.
The pCO2sea has been measured largely on ships, and these measurements are not sufficient to
characterize its spatial and temporal variability. Observations from the vantage point of space
may help, and such observations are the focus of this study.
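
To make the wind-speed dependence concrete, the sketch below evaluates one commonly cited form,
the quadratic relation of Wanninkhof (1992), k = 0.31 U^2 (Sc/660)^(-1/2), with k in cm/h, U the
10-m wind speed in m/s, and Sc the Schmidt number of CO2 in seawater. This report does not state
which parameterization of k is adopted, so the choice is an assumption for illustration; the
function names are hypothetical, and the Schmidt-number polynomial coefficients are quoted from
Wanninkhof (1992) and should be verified against that paper before use.

```python
def schmidt_number_co2(sst_c):
    """Schmidt number of CO2 in seawater as a cubic in SST (deg C)."""
    return (2073.1 - 125.62 * sst_c
            + 3.6276 * sst_c**2 - 0.043219 * sst_c**3)


def piston_velocity_w92(u10_m_per_s, sst_c):
    """Gas transfer velocity in cm/h from 10-m wind speed (m/s)."""
    sc = schmidt_number_co2(sst_c)
    return 0.31 * u10_m_per_s**2 * (sc / 660.0) ** -0.5


# Doubling the wind speed roughly quadruples k at a fixed temperature.
for wind in (3.5, 7.0, 14.0):
    print(wind, round(piston_velocity_w92(wind, 20.0), 2))
```

The quadratic dependence is why high-resolution satellite winds, rather than monthly means, are
attractive for flux estimation: averaging the wind before squaring it underestimates k.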

3. Traditional Methods and Deficiency 

Space-based sensors do not measure the flux or the fugacity directly. Attempts have been made
to establish regional and seasonal relations between pCO2sea and variables that are more readily
measured. For example, Stephens et al. (1995) produced a statistical relation between pCO2 and
SST from nine cruises across the Pacific between 1984 and 1989. They concluded that the relation
is sufficient to estimate pCO2sea from satellite SST over the oligotrophic subtropical Pacific,
but not over the eutrophic Northwest Pacific, where primary production is significant. Many
algorithms relating pCO2sea to SST followed, in the Arabian Sea (Goyet et al. 1998), the Greenland
Sea (Hood et al. 1999), the Sargasso Sea (Nelson et al. 2001), and the equatorial Pacific
(Cosca et al. 2003), but their applicability is limited in geographical region, season, and time
scale by the data used to develop each relation. Ono et al. (2004) added Chl-a as an input,
treating the subtropics and the subarctic separately and using shipboard
measurements in the North Pacific. They found large errors in the subarctic in springtime. Sarma
et al. (2006) used meridional transects to build algorithms via dissolved inorganic carbon (DIC),
using multivariate linear regression. They computed basin-wide, monthly maps using satellite
data, but they found large discrepancies in some regions of the ocean basin. Zhu et al. (2009)
developed summertime multiple polynomial regressions using SST alone and SST together with
Chl-a, based on South China Sea cruise measurements during July 2004. Instead of developing
the algorithm using only high-frequency in situ measurements, Padin et al. (2009) were the first
to regress in situ measurements of pCO2sea against overpass satellite observations of SST and
Chl-a, but this was limited to cruises in the Bay of Biscay. Salinity and alkalinity are also
related to pCO2sea in some of these studies. These studies mostly use linear regression or
multiple polynomials.

Recently, the neural network approach has been applied to estimate pCO2sea in the North
Atlantic. Lefevre et al. (2005) compared two methods, neural network and linear regression, in
deriving the monthly distribution of pCO2sea from SST, time, and location, with measurements
of pCO2sea in the Atlantic subpolar gyre (50-70°N, 60-10°W) from 1995 to 1997. The neural
network approach has better accuracy, with a root-mean-square (RMS) error of 3-11 µatm.
Telszewski et al. (2009) derived the monthly pCO2sea in the North Atlantic (10.5°N-75.5°N) by
applying a self-organizing map neural network to SST, Chl-a, and mixed layer depth (MLD).
The neural network was trained using underway measurements of pCO2sea from 2004 to 2006
and the input data. The RMS error is 11.6 µatm. Friedrich and Oschlies (2009) estimated and
validated monthly pCO2sea in the North Atlantic (15°N-65°N), using a self-organizing neural
network based on pCO2sea generated by a biogeochemical model. The monthly pCO2sea has an
RMS error of 15.9 µatm.

In almost all studies, the relationships between pCO2sea and other parameters are developed
with coincident measurements on cruises, mostly covering a limited region and a particular
season. The correlation coefficients between the climatological annual cycle of pCO2sea and
oceanic parameters change from positive to negative over various regions. A single universal
linear or polynomial regression, as derived in these studies, would not work over the global
ocean across all seasons. Multiple relations covering different regions and seasons would have
strong discontinuities at their boundaries. Support vector regression, with location and time
(season) as input parameters, addresses such problems, and a universal model has been
established for continuous and global coverage. The seasonal and regional limitations of the
relation between pCO2sea and the “driver” parameters are demonstrated with our data product in
Section 9.

4. Support Vector Regression 

A “support vector machine” (SVM) is used to derive pCO2sea using space-based observations.
Xie et al. (2008) have demonstrated that SVMs outperform linear regression and neural networks
in estimating moisture advection by reducing the bias and the standard deviation in comparison
with observations, and the results include more accurate extreme values. The method has several
major applications. SVMs used for regression are referred to as SVR, which is a statistical tool
to derive the relationship between input and output. A comprehensive tutorial on SVR can be found
in Smola and Schölkopf (2004). In the past decade, SVMs have become increasingly popular due
to their broad applications. The approach is relatively easy to use, because there are only a few
parameters to adjust. The simple setting of SVR, in which the fitted model depends only on the
support vectors, helps avoid over-fitting of the training data. Because training is posed as a
standard quadratic programming problem, a single global optimum is obtained. Mapping the inputs
into a high-dimensional feature space through a kernel function handles the nonlinear
relationship between inputs and outputs by turning a nonlinear regression into a linear fit.
For the regression algorithms in this study, a large training data set is needed in order to
represent global coverage with space and
time dependence. The accuracy of SVR depends on the selection of the two hyper-parameters and
the kernel parameter to optimize the retrieval algorithms (Xie et al. 2008). The initial values of
the parameters are empirically estimated from the training data based on previous studies. Then
one parameter at a time is varied until the optimal correlation between the trained output and
the target data is found.
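
For reference, the standard ε-insensitive SVR problem presented in the tutorial of Smola and
Schölkopf (2004) can be written as below. The notation (w, b, φ, ξ, λ) and the choice of a
Gaussian (RBF) kernel are assumptions made for illustration, since this report does not name
the kernel. Under that assumption, C and ε would be the two hyper-parameters and γ the kernel
parameter.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad
  & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \\
\text{subject to} \quad
  & y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i, \\
  & \langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1,\dots,N, \\[6pt]
f(x) &= \sum_{i=1}^{N}\left(\lambda_i - \lambda_i^{*}\right) K(x_i, x) + b, \qquad
K(x, x') = \exp\!\left(-\gamma\,\lVert x - x'\rVert^{2}\right).
\end{aligned}
```

Here x_i is the vector of inputs for training sample i and y_i is the corresponding target.
Only training samples with nonzero (λ_i − λ_i*) contribute to f(x); these are the support
vectors. The objective is convex, which is why the solution is a single global optimum.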


5. In Situ Measurements 

Tremendous effort has been put into synthesizing consistent and quality-controlled pCO2sea
data sets. The database archived at the Lamont-Doherty Earth Observatory (LDEO) contains a
large part of the pCO2sea measurements, which have been contributed by many institutions from
the U.S. and other countries. The latest version (version 2012, Takahashi et al. 2013) consists of
approximately 6.7 million surface ocean pCO2 observations made from 1957 to 2012. We have
continuously checked and updated data sets from the continuous measurements by many U.S.
and international projects and programs over the global ocean. They include the global Volunteer
Observing Ship (VOS) project, the Global CO2 Time-series and Moorings Project, the
International Climate Variability Program (CLIVAR) Global Ocean Carbon and Repeat
Hydrography Program, the Global Coastal Carbon Data Project along the east and west coasts of
North America and the European coast, and the GLobal Ocean Data Analysis Project (GLODAP).
Data collected from cruises over the Pacific are made available from PACIFic ocean Interior
CArbon (PACIFICA). Ongoing cruise measurements of atmospheric and ocean pCO2 are
conducted by the CO2 group of the Atlantic Oceanographic and Meteorological Laboratory
(AOML) and the Pacific Marine Environmental Laboratory (PMEL). The LDEO database may
partly overlap with the latest CARbon dioxide IN the Atlantic Ocean (CARINA) data synthesis
project. CARINA includes data from 188 cruises over the Atlantic Ocean, the Arctic Ocean, and
the Southern Ocean (Key et al. 2010). Data from the Pacific Ocean are being synthesized in the
North Pacific Marine Science Organization (PICES) effort. These data sets are distributed
through the Carbon Dioxide Information Analysis Center (CDIAC). Instead of pCO2 or fCO2,
the CARINA data output is total dissolved inorganic carbon (TCO2, Pierrot et al. 2010). The
conversion between TCO2 and pCO2 follows the program developed for CO2 systems by Lewis and
Wallace (1998) and requires alkalinity, temperature, salinity, and pressure. Recently, the
Surface Ocean CO2 Atlas (SOCAT, Pfeil et al. 2013) project has assembled all publicly available
underway pCO2sea data from the global oceans between 1968 and 2007 with second-level
quality control.


We were able to compile about 250,000 quality-controlled measurements between 2002 and
2010, coincident with satellite measurements of SST and Chl-a, as shown in Fig. 1. This is a
living data set, and we will continue to collect new data to fill the data gaps.
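
The collocation step can be pictured as binning the in situ observations into the same daily
grid cells as the satellite fields and keeping only the cells present in both. The sketch below
is a minimal illustration under assumed conventions (0.5° cells, longitudes in degrees east
from -180 to 180, one cell value per day); the function names, data layout, and sample values
are hypothetical and do not describe the processing code actually used.

```python
import math
from collections import defaultdict
from datetime import date


def grid_cell(lat, lon, day, res=0.5):
    """Key identifying the (day, latitude row, longitude column) cell."""
    row = int(math.floor((lat + 90.0) / res))
    col = int(math.floor(((lon + 180.0) % 360.0) / res))
    return (day, row, col)


def daily_cell_means(observations, res=0.5):
    """Average all (lat, lon, day, pco2) observations falling in one cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, day, pco2 in observations:
        acc = sums[grid_cell(lat, lon, day, res)]
        acc[0] += pco2
        acc[1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}


def collocate(insitu_means, satellite_cells):
    """Keep only cells that have both an in situ mean and a satellite value."""
    return {cell: (insitu_means[cell], satellite_cells[cell])
            for cell in insitu_means if cell in satellite_cells}


# Made-up sample: two ship readings share a cell, a third is elsewhere.
obs = [
    (32.10, 144.60, date(2009, 8, 1), 372.0),
    (32.20, 144.70, date(2009, 8, 1), 376.0),
    (38.05, 146.50, date(2009, 8, 1), 341.0),
]
means = daily_cell_means(obs)
sst = {grid_cell(32.10, 144.60, date(2009, 8, 1)): 27.4}  # hypothetical SST
pairs = collocate(means, sst)
print(len(means), len(pairs))
```

Only the cell with both an in situ mean and a satellite value survives, which is how gaps in
either data source reduce the size of the training set.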
Figure 1. Collocated pCO2sea measurements with satellite observations during 2002-
2010. The pCO2sea data came from SOCAT and all other sources that were compiled
through the Carbon Dioxide Information Analysis Center (CDIAC).

6. Related Space-based Data 

The Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E), on
board NASA’s Aqua satellite, was launched in May 2002 and has been collecting global SST
under clear and cloudy conditions. SST, averaged to 0.25° by 0.25° grids for ascending and
descending paths (Wentz and Meissner 1999), was obtained from Remote Sensing Systems. We
have used SST from microwave sensors in this initial study because microwave retrievals are not
obscured by clouds.

Chl-a is derived from a combination of measurements by the Sea-viewing Wide Field-of-view
Sensor (SeaWiFS) and the Moderate-Resolution Imaging Spectroradiometer (MODIS) on both
Terra and Aqua. The daily Level 3 standard mapped image product has a spatial resolution of 9
km. Because clouds, aerosols, and sunlight availability affect SeaWiFS and MODIS
measurements, large data gaps exist in the daily maps. Both spatial and temporal
smoothing/averaging are applied to the Chl-a data to obtain a larger data set collocated with
the in situ pCO2sea for training and validation.

7. Statistical Model and Validation 

A statistical model has been developed to retrieve pCO2sea from space-based observations using
SVR. The training data are constructed as follows. The target data are combined daily averages
computed from in situ pCO2sea observations over the global oceans, as described in Section 5. Only
data from 2002 onward with collocated satellite data have been used in developing the statistical
model. The input data include the satellite data described in Section 6, which are daily averages
of collocated SST from AMSR-E and Chl-a from SeaWiFS. Climatological sea surface salinity
(SSS) data (Boyer et al. 2005) have also been used. The day of year, longitude, and latitude are
also included in the training as input. Time and longitude are taken in the forms of sine and
cosine because of their periodicity. The input parameters and the target data (x), except for time
and longitude, are normalized as

$$x' = \frac{x - \bar{x}}{\sigma},$$

where $\bar{x}$ and $\sigma$ are the mean and standard deviation of x.
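
The input preparation described above can be sketched as follows. The function names, the
365.25-day period for day of year, and the use of the population standard deviation are
assumptions for illustration; the report does not specify these details.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Normalize values to zero mean and unit standard deviation."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation (assumed)
    return [(v - centre) / spread for v in values]


def cyclic(value, period):
    """Encode a periodic quantity as a (sine, cosine) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Day of year and longitude wrap around, so they become sine/cosine pairs.
print(cyclic(1, 365.25), cyclic(365, 365.25))   # neighbouring days stay close
print(cyclic(-180.0, 360.0), cyclic(180.0, 360.0))  # same meridian, same code

# Non-periodic inputs such as SST are simply standardized.
print(zscore([10.0, 20.0, 30.0]))
```

The sine/cosine pair removes the artificial jump between 31 December and 1 January and between
180°W and 180°E, which a raw day number or longitude would otherwise present to the regression.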

In the current version of the model, ocean MLD was added as an input factor. Operational output
from the Global Ocean Data Assimilation System (GODAS) (Behringer 2007) was used.

After we reserved a set of 40,000 data groups from the total data set described in Sections 5 and
6 for validation, we randomly selected another 40,000 data groups to train the model. The
validation is shown in Fig. 2. For the 40,000 data pairs at 0.5° and daily resolution, the mean
difference between model predictions and measurements is -0.17 µatm and the root-mean-square
(RMS) difference is 16.37 µatm; the latter is 6% of the data range of approximately 270 µatm.
Assuming 28 degrees of freedom, the RMS error of daily data is equivalent to 3.1 µatm for a
monthly mean. In actual practice, the decorrelation time scale would be longer than a day, and
the RMS error for a monthly mean would be between 3.1 µatm and 16.37 µatm for a 3-day
decorrelation time scale.
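
The arithmetic behind these numbers can be reproduced with the sketch below. It assumes the
usual rule that the error of a mean scales as one over the square root of the number of
independent samples, and it approximates the number of independent samples as the number of
days divided by the decorrelation time. That second approximation and the function name are
assumptions for illustration, not a statement of how the report's range was derived.

```python
import math


def rms_of_mean(rms_single, n_samples, decorrelation_days=1.0):
    """RMS error of an average of n_samples daily values.

    The effective number of independent samples is approximated as
    n_samples / decorrelation_days (a simple rule of thumb).
    """
    n_effective = n_samples / decorrelation_days
    return rms_single / math.sqrt(n_effective)


daily_rms = 16.37   # uatm, from the validation above
data_range = 270.0  # uatm, approximate range of the observations

print(round(100.0 * daily_rms / data_range, 1))    # RMS as % of the range
print(round(rms_of_mean(daily_rms, 28), 2))        # independent daily errors
print(round(rms_of_mean(daily_rms, 28, 3.0), 2))   # 3-day decorrelation
```

With independent daily errors the monthly value is about 3.1 µatm; a 3-day decorrelation time
leaves fewer independent samples and therefore a larger monthly error, still well below the
daily value.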



Figure 2. Bin-averaged pCO2sea derived from our model plotted vs. observed pCO2sea (µatm).
40,000 randomly selected observations for 2002-2010, independent of the training data of
the statistical model, are used. The standard deviation is superimposed on each bin average
as error bars.

8. Characterizing the Variability

Along 148°E in the North Pacific (the position is marked in Fig. 3e), the seasonal and
latitudinal variability of the 9-year mean pCO2sea from our model output (Fig. 3a) agrees well
with the Takahashi climatology (Fig. 3b). South of 34°N, pCO2sea has high values in August and
low values in March-April, in phase with SST (Fig. 3d), but in opposite phase with Chl-a
(Fig. 3c). North of this latitude, there are two lows, in April-May and in September-October.
The lows correspond to high values of Chl-a. Stronger biological productivity takes up more CO2.
Biological processes are significant in the distribution of CO2 at higher latitudes, while
physical-chemical processes dominate in the subtropical oceans. West of this longitude, there
are no climatology data at extratropical latitudes, but there are two stations, KEO and JKEO,
south and north of 34°N, with multiyear measurements of pCO2sea. Their locations are marked in
Fig. 3e.

Figures 4 and 5 show that our output agrees well with the in situ measurements that were
accessible to us, two years at KEO and one year at JKEO. Our data show year-to-year variation.
At KEO to the south (Fig. 4), the annual variations of pCO2sea agree well with SST, but they do
not follow the semi-annual variations of Chl-a. At JKEO to the north, there are two cycles a
year in opposite phase with Chl-a, but SST has only one cycle per year.

9. Comparison with CMS Product

The current JPL CMS bottom-up flux estimate (http://cmsflux.jpl.nasa.gov) is a model-based
effort to compute pCO2sea by combining the Estimating the Circulation and Climate of the
Ocean Phase II (ECCO2) model (Marshall et al. 1997), which provides the time-evolving
physical ocean state, and the Darwin model (Follows and Dutkiewicz 2011), which provides
time-evolving ocean ecosystem variables. CMS produced a comparison of two years of data
(2009 and 2010) with coincident LDEO in situ measurements, as shown in Fig. 7 of Brix et al.
(2012). The scatter is very large for the high temporal resolution data, but the points
generally lie around the central line. The more obvious problem is at low observed values
(below 320 ppm), where a large number of overestimates (above 370 ppm) are found. The LDEO data
are a subset of the in situ data we collected, as discussed in Section 5. A similar comparison
with our data is provided in Fig. 6.

The 2 years of data over the global ocean provided by CMS are collocated with our products and
in situ measurements. Figure 6 shows the results of the comparisons in the form of a
bin-averaged scatterplot of 9606 daily data pairs. The lack of sensitivity of the CMS product at
low values is obvious in Fig. 6a, and this is in agreement with the overestimation at low values
shown by Brix et al. (2012). Our products at the same locations and times show better agreement
with the in situ data, as shown in Fig. 6b. The root-mean-square (RMS) difference with in situ
measurements is 52.45 µatm for the CMS product, higher than the 29.82 µatm for our product. The
correlation coefficient of 0.45 for CMS is lower than the 0.85 for our product.
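
For readers who wish to reproduce this kind of comparison, the sketch below computes the mean
difference, the RMS difference, and the Pearson correlation coefficient between a product and
collocated observations. The function name and the sample numbers are made up for illustration;
they are not data from this study.

```python
import math


def comparison_stats(model, observed):
    """Return (mean difference, RMS difference, correlation coefficient)."""
    n = len(observed)
    diffs = [m - o for m, o in zip(model, observed)]
    bias = sum(diffs) / n
    rms = math.sqrt(sum(d * d for d in diffs) / n)

    mean_m = sum(model) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, observed))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    corr = cov / math.sqrt(var_m * var_o)
    return bias, rms, corr


# Hypothetical collocated values in uatm.
observed = [300.0, 340.0, 380.0, 420.0]
product = [310.0, 335.0, 385.0, 415.0]
bias, rms, corr = comparison_stats(product, observed)
print(round(bias, 2), round(rms, 2), round(corr, 3))
```

The RMS difference penalizes both offset and scatter, whereas the correlation coefficient only
measures how well the product tracks the observed variations, so the two are reported together.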

The geographic distribution of the 2-year averages of our products (Fig. 7a) agrees with the
Takahashi climatology (Fig. 7c) better than that of the CMS product (Fig. 7b). The
overestimation (lack of sensitivity) of the CMS products is found largely over the oligotrophic
subtropical oceans, and this suggests that the Darwin model may be deficient where ocean
biological productivity is not a significant driver of surface carbon fugacity, as demonstrated
in Section 8.
Figure 3. Latitude-time variability along 148°E in the North Pacific: (a) pCO2sea derived
from Takahashi et al. (2013) climatology data; (b) same as (a), but from the statistical model
using satellite data, averaged over the 2003-2010 period; (c) and (d) the same as (b),
except for log(Chl-a) and SST, respectively; (e) location of the two stations and the 148°E
longitude line.







Figure 5. Time series of pCO2sea from the statistical model and observed at JKEO,
compared with SST (top) and log(Chl-a) (bottom).
Figure 6. Bin-averaged comparison with coincident in situ measurements of pCO2sea (observed
values in µatm on the horizontal axis) for (a) the version 2 CMS product and (b) data derived
from the statistical model. A total of 9606 collocated data points for the period 2009-2010 are
used. The standard deviation in each bin average is superimposed as error bars.
Figure 7. Distribution of pCO2sea: (a) derived from satellite data using the statistical
model, averaged for the period 2009-2010; (b) same as (a), except from version 2 of the CMS
products; (c) same as (b), except from Takahashi et al. (2012) climatological data.
10. Future Work

The current data will be distributed through the website http://airsea.jpl.nasa.gov/seaflux. It
is a living data set. The model will be updated as more data with better coverage become
available. The record will be extended as new satellite sensors replace the old. For example,
AMSR-2 on Global Change Observation Mission 1-Water (GCOM-W) has replaced AMSR-E.

Aquarius, a satellite mission to measure sea surface salinity (SSS), was launched in 2011
and covers Earth's surface once every 7 days. The SSS data have 1° resolution. Soil Moisture and
Ocean Salinity (SMOS), launched in November 2009, is also providing global SSS
measurements with a 3-day revisit. Intercalibration and merging of SST products are conducted
and coordinated by the Group for High Resolution Sea Surface Temperature (GHRSST) Project.
We will use these products to extend the time series when funding support becomes available.
12. Acronyms and Abbreviations

α          solubility of CO2 in seawater
ΔT         difference in time
AMSR-E     Advanced Microwave Scanning Radiometer for EOS
AOML       Atlantic Oceanographic and Meteorological Laboratory
CARINA     CARbon dioxide IN the Atlantic Ocean
CDIAC      Carbon Dioxide Information Analysis Center
Chl-a      chlorophyll a
CLIVAR     Climate Variability Program
CMS        (Jet Propulsion Laboratory) Carbon Monitoring System
CO2        carbon dioxide
DIC        dissolved inorganic carbon
ECCO2      Estimating the Circulation and Climate of the Ocean Phase II (model)
EOS        Earth Observing System
f          fugacity
F          air-sea exchange of CO2
GCOM-W     Global Change Observation Mission 1-Water
GHRSST     Group for High Resolution Sea Surface Temperature
GLODAP     GLobal Ocean Data Analysis Project
GODAS      Global Ocean Data Assimilation System
JKEO       Japan Agency for Marine-Earth Science and Technology (JAMSTEC) Kuroshio
           Extension Observatory
JPL        Jet Propulsion Laboratory
k          CO2 gas transfer (piston) velocity
KEO        Kuroshio Extension Observatory
LDEO       Lamont-Doherty Earth Observatory
MLD        mixed layer depth
OCO        Orbiting Carbon Observatory
p          pressure
PACIFICA   PACIFic ocean Interior CArbon
PICES      North Pacific Marine Science Organization
PMEL       Pacific Marine Environmental Laboratory
RMS        root mean square
SeaWiFS    Sea-viewing Wide Field-of-view Sensor
SMOS       Soil Moisture and Ocean Salinity
SOCAT      Surface Ocean CO2 Atlas
SSS        sea surface salinity
SST        sea surface temperature
SVM        support vector machine
SVR        support vector machine used for regression
TCO2       total dissolved inorganic carbon
VOS        Volunteer Observing Ship (project)